Abstract

We investigate the effect of incomplete information on the growth
process of scale-free networks, a situation that occurs frequently, e.g. in
real citation networks. Two models are proposed and solved
analytically for the scaling behavior of the connectivity distribution.
In these models the scaling exponent varies with the model
parameters but scaling does not break down, thus introducing the
first models of scale-free networks in an environment of incomplete
information. We compare the analytic results to computer simulations,
which show very good agreement.

Keywords: Random graphs, networks. Socio-economic networks. Stochastic
processes. Growth processes

1 Introduction 

Since the work on complex networks by Strogatz, Watts, Barabási and Albert,
many researchers from fields as distinct as statistical mechanics,
molecular biology, ecology [20], physical chemistry, genetics
or social science [23] have studied the emerging complex structure
and the behavior of networks in their respective fields of research.

A special subset of scale-free non-equilibrium networks can emerge from
a construction procedure in which at each time step t one vertex is added
and connected to m existing vertices with preferential linking. This
preference is proportional to the number of already existing connections of
that particular vertex. By construction the average number of connections
remains constant, $\bar{k} = 2m$. The degree distribution $P(k)$ is of
particular interest as it makes it possible to distinguish different
classes of networks. In scale-free networks one observes a behavior
$P(k) \sim k^{-\gamma}$ that was first discussed by Simon and, in the context
of citation networks, by Price [33]. Extensions to a more complex linking
procedure [31] or more general linkage properties [33] were recently
discussed in detail.

To study the evolution of the distribution, the continuum approximation
is often used. In an undirected network the average number of connections
$k(s,t)$ at time t of a vertex created at time s obeys

$$\frac{\partial k(s,t)}{\partial t} = m \cdot \frac{k(s,t)}{\int_0^t du\, k(u,t)} \tag{1}$$
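As a point of reference, the full-information equation can be solved in closed form. The block below is the standard continuum argument rather than a step taken from this text. It assumes the initial condition $k(t,t) = m$ and uses the mean degree $2m$ stated above, so that the integral in the denominator equals $2mt$. The symbol $\kappa$ is introduced here only to separate a degree value from the function $k(s,t)$.

```latex
\begin{gather*}
\int_0^t du\, k(u,t) = 2mt
  \quad\Rightarrow\quad
  \frac{\partial k(s,t)}{\partial t} = \frac{k(s,t)}{2t}, \\[4pt]
k(t,t) = m
  \quad\Rightarrow\quad
  k(s,t) = m \left( \frac{t}{s} \right)^{1/2}, \\[4pt]
k(s,t) > \kappa \iff \frac{s}{t} < \frac{m^2}{\kappa^2}
  \quad\Rightarrow\quad
  P(\kappa) = -\frac{d}{d\kappa} \frac{m^2}{\kappa^2} = \frac{2m^2}{\kappa^3}.
\end{gather*}
```

The last line uses the fact that vertices are created at a uniform rate, so the fraction of vertices with $s/t < m^2/\kappa^2$ is $m^2/\kappa^2$. The resulting exponent is $\gamma = 3$, the full-information value that both models in this paper should approach.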



Bianconi and Barabási [27] pointed to the effects of a distribution of
fitness, i.e. of the ability of individual vertices to attract new
connections. This can already be regarded as a prototypical example of
incomplete information if the fitness in their model is interpreted as
incomplete knowledge of the newer vertices about the individual properties
or the existence of the present vertices.

Mossa et al. [38] showed that the power-law behavior might be truncated
due to information filtering. In their model a newly attached vertex is only
aware of a certain subset of the existing vertices. This subset is, however,
chosen randomly for each vertex individually. Therefore the incomplete
information has no global properties but is instead a local property.

Here we follow another route, with two distinct models, to deal with
the more interesting case in which the incomplete information is
attached to the new vertices individually and is still global with respect
to the whole network. One model mimics a generic and global effect that is
present in all real citation networks, while the other describes the
influence of individual unawareness of information.

2 Growing nets and latency 

For a newly created vertex the only relevant information is a list of existing
vertices to connect to and their respective degrees. Incomplete information
results in the ignorance of some of those vertices. This effect is mediated here
via an 'awareness' function $\phi(s,t)$ that makes the newly connected vertex t
aware of the existing vertex s.



$$\phi(s,t) := \begin{cases} 1 & s \text{ is known to } t \\ 0 & \text{else} \end{cases}$$



Eq. (1) then becomes

$$\frac{\partial k(s,t)}{\partial t} = m \cdot \frac{k(s,t)\,\phi(s,t)}{\int_0^t du\, k(u,t)\,\phi(u,t)} \tag{2}$$



While there are many choices for $\phi(s,t)$, we propose one particular
structure that resembles actual effects in citation networks. We further set
$m = 1$, as this network property has no influence on the scaling exponent
$\gamma$ in the models.

A newly created vertex might not be aware of the most recently created
entities in the network. One encounters this situation very often: the
author of a new WWW page cannot be aware of other recently created pages
that he would like to link to. Search engines do not list newly created
pages instantaneously, and therefore authors can find new pages only by
chance.

This incomplete information about the vertices of the network results
in a selection of 'older' vertices for linking. We model this by setting
$\phi_<(s,t) := \Theta(c \cdot t - s)$ with some constant $0 < c < 1$ and
$\Theta(x)$ the Heaviside step function. A newly created vertex is therefore
only aware of the oldest fraction c of the existing vertices.
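The growth rule with latency can be sketched as a small simulation. This is a minimal illustration for m = 1, not the authors' code: the function name, the seed handling and the rejection step are assumptions made here. The initial chain of about 1/c vertices follows the initialization described later in the text. A target is drawn proportionally to its degree by picking a random edge endpoint, and it is rejected whenever it lies outside the visible range s < c·t.

```python
import random
from collections import Counter


def grow_latency_network(n_vertices, c, seed=0):
    """Grow a tree (m = 1) where new vertex t only sees vertices s < c*t.

    Hypothetical helper for illustration; returns (edges, degrees).
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    rng = random.Random(seed)
    n0 = max(2, round(1.0 / c))  # initial chain of about 1/c vertices
    if n_vertices < n0:
        raise ValueError("n_vertices is smaller than the initial chain")

    edges = [(i, i + 1) for i in range(n0 - 1)]
    # Every edge contributes both endpoints, so a uniform draw from this
    # list selects a vertex with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]

    for t in range(n0, n_vertices):
        limit = c * t
        while True:
            s = rng.choice(endpoints)
            if s < limit:  # reject vertices the newcomer is not aware of
                break
        edges.append((s, t))
        endpoints.extend((s, t))

    degree = Counter(endpoints)
    return edges, [degree[v] for v in range(n_vertices)]


if __name__ == "__main__":
    edges, degrees = grow_latency_network(10_000, c=0.4, seed=1)
    histogram = Counter(degrees)
    for k in sorted(histogram)[:5]:
        print(k, histogram[k])
```

Rejection keeps the sampling exact, because accepted draws are distributed proportionally to degree within the visible set. It is also cheap here: every edge has its older endpoint inside the visible range, so at least about half of all draws are accepted.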

While this seems at first glance too strong an assumption, it is
realized in exponentially growing systems like the WWW: suppose that at the
current real time $\tau$ we cannot know pages younger than some period
$\tau_c$. Then all existing pages that are capable of attracting a link are
taken from the interval $[0; \tau - \tau_c]$. As the number of pages obeys
$N(\tau) \sim \exp(\alpha \cdot \tau)$, this translates in terms of N to
$[0; c \cdot N(\tau)]$. Here we have set $c := \exp(-\alpha \tau_c)$.
Recall that the time t of eq. (2) is actually the number of vertices, so that
we actually draw a vertex from $[0; c \cdot t]$.
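A short numerical example may make the mapping from latency to awareness concrete. The numbers are hypothetical (a doubling time of 12 months and a latency of 3 months) and the function name is an assumption of this sketch. It only evaluates c = exp(-alpha * tau_c) and confirms that c equals the ratio N(tau - tau_c) / N(tau).

```python
import math


def awareness_fraction(alpha, tau_c):
    """Fraction c of vertices that are visible when N(tau) ~ exp(alpha * tau)."""
    return math.exp(-alpha * tau_c)


# Hypothetical numbers: the network doubles every 12 months and new pages
# stay invisible for the first 3 months.
alpha = math.log(2.0) / 12.0
tau_c = 3.0
c = awareness_fraction(alpha, tau_c)

# The same fraction follows from the ratio of network sizes at any time tau.
tau = 10.0
ratio = math.exp(alpha * (tau - tau_c)) / math.exp(alpha * tau)
print(c, ratio)
```

The ratio does not depend on the chosen time, which is why a fixed latency in real time becomes a fixed fraction c when time is measured in numbers of vertices.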

Rewriting the definition of $\phi_<(s,t)$ we get
$\phi_<(s,t) = \Theta(c - s/t)$ and see that this function depends only on the
ratio $s/t$. We can then set $x = s/t$ and rewrite equation (2) as
(with $k(s,t) = k(x)$)



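$$-x\,\frac{dk(x)}{dx} = \frac{k(x)\,\phi_<(x)}{\frac{1}{t}\int_0^t du\, k(u/t)\,\phi_<(u/t)} = \frac{k(x)\,\phi_<(x)}{\int_0^1 dy\, k(y)\,\phi_<(y)} = \beta \cdot k(x)\,\phi_<(x) \quad \text{with } k(1) = 1 \tag{3}$$

$\beta^{-1}$ is the integral in the denominator. The boundary condition
$k(1) = 1$ reflects the fact that upon creation a new node $s = t$ has only
one connection.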
The solution of this differential equation is 



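$$k(x) = \begin{cases} C \cdot x^{-\beta} & 0 < x < c \\ \text{const} = 1 & x > c \end{cases}$$

As the function has to be continuous at $x = c$, we must have
$k(c) = k(1) = 1$. From this we can derive the value of the constant,
$C = c^{\beta}$, and arrive at

$$k(x) = \begin{cases} (c/x)^{\beta} & 0 < x < c \\ 1 & x > c \end{cases}$$

The equation for $\beta$ is then

$$\frac{1}{\beta} = \int_0^c k(x)\, dx = \frac{c}{1-\beta} \quad \Longleftrightarrow \quad \beta = \frac{1}{1+c}$$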







By scaling arguments one can prove that in general the relation
$\gamma = 1 + 1/\beta$ holds [36, 33]. Using this we conclude that here
$\gamma = 2 + c$. This is depicted in figure 1, which compares this result
with computer simulations.
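The self-consistency condition for β can also be checked numerically. The sketch below assumes the continuous solution k(x) = (c/x)^β for 0 < x < c, for which the integral from 0 to c has the closed form c/(1 - β) when β < 1. It then solves 1/β = c/(1 - β) by bisection. The function names are chosen here for illustration.

```python
def solve_beta(c, tol=1e-12):
    """Solve 1/beta = c / (1 - beta) for beta in (0, 1) by bisection."""

    def residual(beta):
        # Decreases monotonically in beta: positive near 0, negative near 1.
        return 1.0 / beta - c / (1.0 - beta)

    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scaling_exponent(c):
    """Exponent gamma = 1 + 1/beta of the degree distribution."""
    return 1.0 + 1.0 / solve_beta(c)


if __name__ == "__main__":
    for c in (0.2, 0.4, 0.6, 0.8, 1.0):
        print(c, solve_beta(c), scaling_exponent(c))
```

The numerical root agrees with β = 1/(1 + c), so the exponent follows the straight line γ = 2 + c and reaches γ = 3 at full awareness, c = 1.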

We can further analyze the emerging networks by investigating the shortest
paths in these networks. Figure 2 shows the increase of both the average
$l_{avg}$ and the maximum $l_{max}$ of the shortest path lengths [37] in
independently created networks with increasing c above $c \approx 0.2$.

A smaller $c \in [0.2; 1.0]$ reduces the number of available vertices in the
growth process and the network gets denser: the shortest path lengths
get smaller and the probability of vertices with a larger number of
connections gets bigger, as can be seen from the previous result
$\gamma = 2 + c$.
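The path-length statistics can be computed with a breadth-first search from every vertex. The text does not say how the averages were taken, so this sketch assumes that $l_{avg}$ is the mean over all pairs of distinct, connected vertices and that $l_{max}$ is the largest such distance. The function name and the edge-list input format are likewise assumptions.

```python
from collections import deque


def path_length_stats(n_vertices, edges):
    """Return (average, maximum) shortest path length over all vertex pairs."""
    adjacency = [[] for _ in range(n_vertices)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    total, pairs, longest = 0, 0, 0
    for source in range(n_vertices):
        # Breadth-first search gives hop distances in an unweighted graph.
        dist = [-1] * n_vertices
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist:
            if d > 0:  # skip the source itself and unreachable vertices
                total += d
                pairs += 1
                longest = max(longest, d)
    return total / pairs, longest


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3)]
    print(path_length_stats(4, chain))
```

Each pair is counted twice, once from either end, which leaves the average unchanged. The cost grows quadratically with the number of vertices, so this form suits networks of a few thousand vertices rather than millions.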

For very small $c < 0.2$ we see, however, an increase in $l_{avg}$ and
$l_{max}$. This is an artifact of our initial setting of the first 1/c
vertices to form a chain. While the chain guarantees that none of the first
vertices is preferred to another initial vertex, it now strongly influences
the path lengths. For the smallest $c = 0.025$ used here we get
$l_{max} = 40.99 \approx 1/c$. Here the chain length dominates the path
lengths.

Figure 1: a) For c = 0.4 with N=10^ and sampled over 100 replica we
obtain a good fit to $N(k) = N \cdot p(k) \sim k^{-\gamma}$ with
$\gamma_{fit} \approx 2.37$ in the interval $\log(k) \in [2.2; 5.5]$. The
fitted curve was shifted for clarity. b) The exponent in the scaling law
$P(k) \sim k^{-\gamma}$ as a function of the information awareness c
in simulated networks. The straight line is the function $\gamma = 2 + c$.
For each point 600 independent networks of 10,000 vertices each were sampled
and the best fit was used. The error bars stem from the fitting procedure of
the cumulative distribution [36].

Figure 2: The average and the maximum of the minimal path lengths in
the network grown with an information awareness of c. For every point
we sampled over 100 independent networks. The first 1/c vertices were
initialized to form a chain, so that none of them is preferred.

Suppose that WWW pages are the vertices in this scenario and links are
the edges of the network. This model then takes into account the time that
e.g. search engines need to encounter new web pages and make the general
public aware of those sites. The anonymity of a vertex s is healed over time:
it slides into the focus of new vertices as soon as $s < c \cdot t$. Here the
incomplete information applies uniformly to every new vertex.

There is also the opposite scenario in which some vertices are aware of the 
full information (that is the number of connections all the existing vertices 
possess) and others are just ignorant and connect with equal probability to 
any of the existing ones. Here the incompleteness of information is restricted 
to a subset of individuals: 



3 Growing nets and partial ignorance 

Suppose that a newly added vertex is, with some probability p, aware of the
connectivities of all other vertices. In this case it is attached with the
preferential linking described above. With probability $1 - p$ it is connected
without preference. We want to deduce the effect on the connectivity
distribution from the master equation for the average number $N(k,t)$ of
vertices of degree k at time t



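$$N(k,t+1) = p \cdot \left[ N(k,t) + \frac{k-1}{\bar{k}\,t}\, N(k-1,t) - \frac{k}{\bar{k}\,t}\, N(k,t) \right] + (1-p) \cdot \left[ N(k,t) + \frac{1}{t}\, N(k-1,t) - \frac{1}{t}\, N(k,t) \right] + \delta_{k,1} \tag{4}$$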



Here $\bar{k} = 2$ is the average degree of each vertex. The first term
describes the preferential linking with its in- and outflow, while the second
term provides for the additional connections or loss thereof with equal
probability. Notice that there are currently t vertices in the network, so
1/t is the probability of hitting any one of those. The third term is finally
responsible for the newly added vertex. By changing to continuous time we get
from eq. (4)



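$$\frac{\partial}{\partial t}\bigl( t \cdot P(k,t) \bigr) = p \cdot \left[ \frac{k-1}{\bar{k}}\, P(k-1,t) - \frac{k}{\bar{k}}\, P(k,t) \right] + (1-p) \cdot \bigl[ P(k-1,t) - P(k,t) \bigr] + \delta_{k,1}$$

where $P(k,t) = N(k,t)/t$ is the density of vertices with degree k at time t.
We can now solve for the stationary distribution for $t \to \infty$. We arrive
at the recursion

$$P(k) = \frac{pk - p + \bar{k} - p\bar{k}}{pk + 2\bar{k} - p\bar{k}}\, P(k-1) + \frac{\bar{k}}{pk + 2\bar{k} - p\bar{k}}\, \delta_{k,1}$$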

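With $\bar{k} = 2$ this can further be written as

$$P(k) = \frac{2}{4-p} \cdot \frac{\Gamma(4/p)}{\Gamma(2/p - 1)} \cdot \frac{\Gamma(k + 2/p - 2)}{\Gamma(k + 4/p - 1)} \tag{5}$$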
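The stationary recursion is easy to iterate numerically, which gives a quick check of the large-k behavior. The sketch assumes the recursion for mean degree 2 in the form P(k) = (pk + 2 - 3p) / (pk + 4 - 2p) · P(k-1) with P(1) = 2 / (4 - p), as it follows from the stationary limit of the master equation. The function names are illustrative.

```python
import math


def stationary_distribution(p, k_max):
    """Iterate the stationary recursion; entry k of the result is P(k)."""
    probs = [0.0, 2.0 / (4.0 - p)]  # index 0 is unused, P(1) = 2 / (4 - p)
    for k in range(2, k_max + 1):
        ratio = (p * k + 2.0 - 3.0 * p) / (p * k + 4.0 - 2.0 * p)
        probs.append(ratio * probs[-1])
    return probs


def local_exponent(probs, k):
    """Estimate gamma from the decay of P between k and 2k."""
    return -math.log(probs[2 * k] / probs[k]) / math.log(2.0)


if __name__ == "__main__":
    for p in (1.0, 0.7, 0.4):
        probs = stationary_distribution(p, 10_000)
        print(p, local_exponent(probs, 5_000), 1.0 + 2.0 / p)
```

For p = 1 the iteration reproduces P(k) = 4 / (k (k+1) (k+2)), the known result for pure preferential linking. For p = 0 each step halves the probability, so no power law remains.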



Using the relationship

$$\frac{\Gamma(k+a)}{\Gamma(k+b)} = k^{a-b} \left[ 1 + O(k^{-1}) \right]$$

for large k [12] we conclude that $\gamma = 1 + 2/p$ for large k, with a
divergence of the exponent when approaching $p = 0$, as in this case we have
no preferential linking at all. In this case the starting master equation
correctly leads to the exponential distribution $P_{p=0}(k) = 2^{-k}$. The
diverging behavior of the exponent was for instance also found by Krapivsky
and Redner in their treatment of growing networks with redirection. Figure 3
shows the results from computer experiments for this model. The smaller p is,
the more difficult it is to see any indication of the power law.
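A computer experiment for this model can be sketched in a few lines. This is an illustrative implementation for one link per new vertex, not the authors' code. The function name and the seed of two connected vertices are assumptions. An informed vertex picks a random edge endpoint, which selects a target proportionally to its degree, while an ignorant vertex picks a target uniformly.

```python
import random
from collections import Counter


def grow_partial_ignorance(n_vertices, p, seed=0):
    """Grow a tree in which each new vertex is informed with probability p.

    Hypothetical helper for illustration; returns the list of degrees.
    """
    rng = random.Random(seed)
    endpoints = [0, 1]  # assumed start: two vertices joined by one edge
    for t in range(2, n_vertices):
        if rng.random() < p:
            s = rng.choice(endpoints)  # informed: proportional to degree
        else:
            s = rng.randrange(t)       # ignorant: uniform over vertices
        endpoints.extend((s, t))
    degree = Counter(endpoints)
    return [degree[v] for v in range(n_vertices)]


if __name__ == "__main__":
    degrees = grow_partial_ignorance(100_000, p=0.4, seed=1)
    histogram = Counter(degrees)
    for k in sorted(histogram)[:8]:
        print(k, histogram[k] / len(degrees))
```

The printed densities can be compared with the stationary distribution of the master equation. For small p the power-law tail sets in only at large degrees, so very large networks are needed to see it.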



4 Conclusion 

In this paper we developed two distinct models to describe the effect of 1)
global incomplete information caused by penetration rates while constructing
a citation network and 2) local incomplete information of individual vertices
that are attached with a probability of 'non-knowledge'. We derived the
scaling behavior of the degree distribution for large degrees in both cases
and compared this to computer experiments. Both models approach the
analytic value [1] of $\gamma = 3$ when reaching full information. The
incomplete information in the two models does not destroy the scale-free
behavior of the systems, while Mossa et al. [38] found a cross-over from
scale-free behavior to an exponential in another model which takes
information into account. By comparison one can see the influence incomplete
information may have on the global structure of growing networks.

We will work out particulars on real-world networks and the influence of
incomplete information in a forthcoming study.

Figure 3: The cumulative number of connections
$N_{cum}(k) = \sum_{k' \ge k} N(k')$ of degree k in networks of size L
averaged over 500 independent runs. The symbols distinguish the network sizes
L = 10,000, L = 5,000,000, L = 7,500,000 and L = 10,000,000. The data was
shifted for a better overview. The straight line through the data of
L = 10,000 is the derived result of eq. (5) with p = 0.4, while the broken
line indicates an asymptotic power law for the cumulative number of a
distribution with $\gamma = 1 + 2/p = 6$.